{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c6c03e23",
   "metadata": {},
   "source": [
    "# Serve Llama2-70B on SageMaker using DJL container"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a968267",
   "metadata": {},
   "source": [
    "In this notebook, we deploy the Llama-2-70B model across GPUs on a ml.p4d.24xlarge instance."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb63b6c1",
   "metadata": {},
   "source": [
    "## Import the relevant libraries and configure several global variables using boto3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a78aa656-a254-4037-8736-2c3ab0a9ef7e",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install sagemaker boto3 --upgrade"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "456e483a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import jinja2\n",
    "from sagemaker import image_uris\n",
    "import boto3\n",
    "import os\n",
    "import time\n",
    "import json\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1867d693",
   "metadata": {},
   "outputs": [],
   "source": [
    "role = sagemaker.get_execution_role()  # execution role for the endpoint\n",
    "sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs\n",
    "bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "model_bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "s3_code_prefix = \"hf-large-model-djl/llama2\"  # folder within bucket where code artifact will go\n",
    "\n",
    "region = sess._region_name\n",
    "account_id = sess.account_id()\n",
    "\n",
    "s3_client = boto3.client(\"s3\")\n",
    "sm_client = boto3.client(\"sagemaker\")\n",
    "smr_client = boto3.client(\"sagemaker-runtime\")\n",
    "\n",
    "jinja_env = jinja2.Environment()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b6c6a87",
   "metadata": {},
   "source": [
    "## Create SageMaker compatible Model artifact, upload model to S3 and bring your own inference script.\n",
    "\n",
    "SageMaker Large Model Inference containers can be used to host models without providing your own inference code. This is extremely useful when there is no custom pre-processing of the input data or postprocessing of the model's predictions.\n",
    "\n",
    "SageMaker needs the model artifacts to be in a Tarball format. In this example, we provide the following files - `serving.properties`.\n",
    "\n",
    "The tarball is in the following format:\n",
    "\n",
    "```\n",
    "code\n",
    "├──── \n",
    "│   └── serving.properties\n",
    "\n",
    "```\n",
    "\n",
    "- `serving.properties` is the configuration file that can be used to configure the model server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d037781",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p llama2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a507efc",
   "metadata": {},
   "source": [
    "#### Create serving.properties \n",
    "This is a configuration file to indicate to DJL Serving which model parallelization and inference optimization libraries you would like to use. Depending on your need, you can set the appropriate configuration.\n",
    "\n",
    "Here is a list of settings that we use in this configuration file -\n",
    "- `engine`: The engine for DJL to use. In this case, we intend to use Accelerate and hence set it to **Python**. \n",
    "- `option.model_id`: The model id of a pretrained model hosted inside a model repository on huggingface.co (https://huggingface.co/models). The container uses this model id to download the corresponding model repository on huggingface.co.\n",
    "- `option.tensor_parallel_degree`: Set to the number of GPU devices over which Accelerate needs to partition the model. This parameter also controls the no of workers per model which will be started up when DJL serving runs. As an example if we have a 8 GPU machine and we are creating 8 partitions then we will have 1 worker per model to serve the requests.\n",
    "\n",
    "For more details on the configuration options and an exhaustive list, you can refer the documentation - https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-configuration.html."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e18abd82",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a variable to contain the s3url of the location that has the model\n",
    "pretrained_model_location = f\"s3://sagemaker-example-files-prod-{region}/models/llama2/\"\n",
    "print(f\"Pretrained model will be downloaded from ---- > {pretrained_model_location}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3a1437b",
   "metadata": {},
   "source": [
    "In the below cell, we leverage [Jinja](https://pypi.org/project/Jinja2/) to create a template for serving.properties. Specifically, we parameterize `option.s3url` so that it can be changed based on the pretrained model location."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "786a02ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./llama2/serving.properties\n",
    "engine=MPI\n",
    "option.model_id=TheBloke/Llama-2-70B-fp16\n",
    "option.tensor_parallel_degree=8\n",
    "option.dtype=fp16\n",
    "option.rolling_batch=auto\n",
    "option.max_rolling_batch_size=4\n",
    "option.model_loading_timeout=3600"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03d9203a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we plug in the appropriate model location into our `serving.properties` file based on the region in which this notebook is running\n",
    "template = jinja_env.from_string(Path(\"llama2/serving.properties\").open().read())\n",
    "Path(\"llama2/serving.properties\").open(\"w\").write(\n",
    "    template.render(s3url=pretrained_model_location)\n",
    ")\n",
    "!pygmentize llama2/serving.properties | cat -n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5516aca",
   "metadata": {},
   "source": [
    "**Image URI for the DJL container is being used here**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36729749",
   "metadata": {},
   "outputs": [],
   "source": [
    "# inference_image_uri = f\"{account_id}.dkr.ecr.{region}.amazonaws.com/djl-ds:latest\"\n",
    "inference_image_uri = image_uris.retrieve(\n",
    "    framework=\"djl-deepspeed\",\n",
    "    region=sess.boto_session.region_name,\n",
    "    version=\"0.27.0\"\n",
    ")\n",
    "print(f\"Image going to be used is ---- > {inference_image_uri}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2ff7d02",
   "metadata": {},
   "source": [
    "**Create the Tarball and then upload to S3 location**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "313ef1ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -f model.tar.gz\n",
    "!tar czvf model.tar.gz -C llama2 ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9a634a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_code_artifact = sess.upload_data(\"model.tar.gz\", bucket, s3_code_prefix)\n",
    "print(f\"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d86bc297",
   "metadata": {},
   "source": [
    "### To create the end point the steps are:\n",
    "\n",
    "1. Create the Model using the Image container and the Model Tarball uploaded earlier\n",
    "2. Create the endpoint config using the following key parameters\n",
    "\n",
    "    a) Instance Type is ml.p4d.24xlarge\n",
    "\n",
    "    b) ContainerStartupHealthCheckTimeoutInSeconds is 3600 to ensure health check starts after the model is ready    \n",
    "\n",
    "\n",
    "3. Create the end point using the endpoint config created"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "736e245f",
   "metadata": {},
   "source": [
    "#### Create the Model\n",
    "Use the image URI for the DJL container and the s3 location to which the tarball was uploaded.\n",
    "\n",
    "The container downloads the model into the `/tmp` space on the container because SageMaker maps the `/tmp` to the Amazon Elastic Block Store (Amazon EBS) volume that is mounted when we specify the endpoint creation parameter VolumeSizeInGB.\n",
    "\n",
    "For instances like p4dn, which come pre-built with the volume instance, we can continue to leverage the `/tmp` on the container. The size of this mount is large enough to hold the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69fb1813",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.utils import name_from_base\n",
    "\n",
    "model_name = name_from_base(f\"llama2-djl\")\n",
    "print(model_name)\n",
    "\n",
    "create_model_response = sm_client.create_model(\n",
    "    ModelName=model_name,\n",
    "    ExecutionRoleArn=role,\n",
    "    PrimaryContainer={\"Image\": inference_image_uri,\n",
    "                      \"ModelDataUrl\": s3_code_artifact,\n",
    "                      'Environment': {'MODEL_LOADING_TIMEOUT': '3600'}\n",
    "                     }\n",
    ")\n",
    "model_arn = create_model_response[\"ModelArn\"]\n",
    "\n",
    "print(f\"Created Model: {model_arn}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd78601e",
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint_config_name = f\"{model_name}-config\"\n",
    "endpoint_name = f\"{model_name}-endpoint\"\n",
    "\n",
    "endpoint_config_response = sm_client.create_endpoint_config(\n",
    "    EndpointConfigName=endpoint_config_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"VariantName\": \"variant1\",\n",
    "            \"ModelName\": model_name,\n",
    "            \"InstanceType\": \"ml.p4d.24xlarge\",\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            \"ModelDataDownloadTimeoutInSeconds\": 3600,\n",
    "            \"ContainerStartupHealthCheckTimeoutInSeconds\": 3600,\n",
    "        },\n",
    "    ],\n",
    ")\n",
    "endpoint_config_response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bf1662a",
   "metadata": {},
   "outputs": [],
   "source": [
    "create_endpoint_response = sm_client.create_endpoint(\n",
    "    EndpointName=f\"{endpoint_name}\", EndpointConfigName=endpoint_config_name\n",
    ")\n",
    "print(f\"Created Endpoint: {create_endpoint_response['EndpointArn']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e76ed7c9",
   "metadata": {},
   "source": [
    "### This step can take ~ 20 min or longer so please be patient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a346828",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "status = resp[\"EndpointStatus\"]\n",
    "print(\"Status: \" + status)\n",
    "\n",
    "while status == \"Creating\":\n",
    "    time.sleep(60)\n",
    "    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "    status = resp[\"EndpointStatus\"]\n",
    "    print(\"Status: \" + status)\n",
    "\n",
    "print(\"Arn: \" + resp[\"EndpointArn\"])\n",
    "print(\"Status: \" + status)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80638bee",
   "metadata": {},
   "source": [
    "#### Leverage Boto3 to invoke the endpoint. \n",
    "\n",
    "This is a generative model so we pass in a Text as a prompt and Model will complete the sentence and return the results.\n",
    "\n",
    "You can pass a prompt as input to the model. This done by setting `inputs` to a prompt. The model then returns a result for each prompt. The text generation can be configured using appropriate parameters. These `parameters` need to be passed to the endpoint as a dictionary of `kwargs`. Refer this documentation - https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig for more details.\n",
    "\n",
    "The below code sample illustrates the invocation of the endpoint using a text prompt and also sets some parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d799ff95",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "prompt = \"Amazon.com is the best\"\n",
    "response_model = smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": prompt,\n",
    "            \"parameters\": {\n",
    "                \"max_new_tokens\": 50,\n",
    "                \"do_sample\": True,\n",
    "            },\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")\n",
    "\n",
    "response_model[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c5a7651",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "In this post, we demonstrated how to use SageMaker large model inference containers to host Llama2-70B. For more details about Amazon SageMaker and its large model inference capabilities, refer to the following:\n",
    "\n",
    "* Model parallelism and large model inference on Sagemaker (https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-inference.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03a86317",
   "metadata": {},
   "source": [
    "## Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2defcfef",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the endpoint\n",
    "sm_client.delete_endpoint(EndpointName=endpoint_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28278159",
   "metadata": {},
   "outputs": [],
   "source": [
    "# In case the endpoint failed we still want to delete the model\n",
    "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
    "sm_client.delete_model(ModelName=model_name)"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   }
  ],
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
